Thresholding
You can adjust the classification_threshold for a guardrail policy to control how strict the system is when classifying content. These thresholds help balance the trade-off between metrics such as recall and false positive rate, enabling you to align policy performance with your enterprise requirements. Note that setting thresholds is optional.
Thresholding for Content Policies
For Content Policies, classification thresholds are floating-point values between 0 and 1 that set the confidence level required to flag or block content. Adjusting the threshold tunes your guardrail toward stricter or more lenient behavior. Prompts with scores greater than the set threshold are marked as non-compliant.
- A lower threshold (e.g., 0.3) means more content will be flagged as violating the policy.
- A higher threshold (e.g., 0.9) means only very confident violations will be flagged.
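To make the trade-off concrete, here is a minimal sketch that applies the "score greater than threshold" rule to a handful of scores and reports recall and false positive rate at several thresholds. The sample scores, their labels, and the helper names `flag` and `recall_and_fpr` are invented for illustration and are not part of DynamoGuard.

```python
# Hypothetical (score, is_actual_violation) pairs, invented for illustration only.
samples = [
    (0.95, True),
    (0.80, True),
    (0.55, True),
    (0.40, False),
    (0.35, True),
    (0.10, False),
]


def flag(score, threshold):
    """Apply the rule described above: a score strictly greater than the threshold is non-compliant."""
    return score > threshold


def recall_and_fpr(samples, threshold):
    """Return (recall, false positive rate) for the given threshold."""
    tp = sum(1 for s, v in samples if v and flag(s, threshold))
    fn = sum(1 for s, v in samples if v and not flag(s, threshold))
    fp = sum(1 for s, v in samples if not v and flag(s, threshold))
    tn = sum(1 for s, v in samples if not v and not flag(s, threshold))
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


for threshold in (0.3, 0.5, 0.9):
    recall, fpr = recall_and_fpr(samples, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, false positive rate={fpr:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.9 removes the false positive but also misses most of the true violations, which is the balance the threshold is meant to control.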
Setting Thresholds in DynamoGuard
Thresholds for content policies can be set using the update-threshold endpoint.
import requests
import json

# Replace with your actual policy ID and API key
policy_id = "<POLICY_ID>"
api_key = "<API_KEY>"
url = f"https://api.dynamo.ai/v1/moderation/policy/{policy_id}/update-threshold"

# classification_threshold controls the confidence threshold above which content is flagged
payload = json.dumps({
    "classification_threshold": 0.5  # modify this as needed
})
headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {api_key}'
}

response = requests.put(url, headers=headers, data=payload)
if response.status_code == 200:
    print("Threshold updated successfully.")
else:
    print(f"Failed to update threshold: {response.status_code}, {response.text}")
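Because thresholds are floating-point values between 0 and 1, it can help to validate the value locally before sending the request. The helper below is a sketch only: `build_threshold_payload` is a hypothetical name, and treating the endpoints 0 and 1 as valid is an assumption, since the text does not say whether the range is inclusive.

```python
import json


def build_threshold_payload(threshold):
    """Return the JSON request body for a threshold update, rejecting out-of-range values."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise TypeError("threshold must be a number")
    # Assumption: 0 and 1 themselves are accepted.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return json.dumps({"classification_threshold": float(threshold)})


payload = build_threshold_payload(0.5)
print(payload)
```

The returned string can be passed as the `data` argument of the `requests.put` call shown above.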
Thresholding for Hallucination Policies
For Hallucination Policies, thresholds are based on hallucination scores generated by the evaluation model, where higher scores indicate better compliance. For each metric, the threshold applies to the following score:
- Summarization Consistency: The probability that a response is consistent with the prompt
- RAG Hallucination - Input Relevance: The probability that the input is relevant to the context
- RAG Hallucination - Response Relevance: The probability that the model response is relevant to the context or input
Thresholds for hallucination policies can be set during policy creation.
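The sketch below illustrates how the direction differs from content policies. It assumes that a response is treated as compliant when its score is at or above the threshold; the text only states that higher scores indicate better compliance, so the exact comparison is an assumption. The threshold values, the scores, and the helper name `is_compliant` are hypothetical.

```python
# Hypothetical per-metric thresholds and scores, invented for illustration only.
thresholds = {
    "Summarization Consistency": 0.7,
    "RAG Hallucination - Input Relevance": 0.5,
    "RAG Hallucination - Response Relevance": 0.6,
}
scores = {
    "Summarization Consistency": 0.82,
    "RAG Hallucination - Input Relevance": 0.41,
    "RAG Hallucination - Response Relevance": 0.60,
}


def is_compliant(score, threshold):
    """Assumption: higher is better, so a score at or above the threshold passes."""
    return score >= threshold


for metric, threshold in thresholds.items():
    status = "compliant" if is_compliant(scores[metric], threshold) else "non-compliant"
    print(f"{metric}: score={scores[metric]}, threshold={threshold} -> {status}")
```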